The emergence of COVID-19 has had a global and profound impact, not only on society as a whole, but also on the lives of individuals. Various prevention measures were introduced around the world to limit the transmission of the disease, including face masks, mandates for social distancing and regular disinfection in public spaces, and the use of screening applications. These developments also triggered the need for novel and improved computer vision techniques capable of (i) providing support to the prevention measures through an automated analysis of visual data, on the one hand, and (ii) facilitating normal operation of existing vision-based services, such as biometric authentication schemes, on the other. Especially important here, are computer vision techniques that focus on the analysis of people and faces in visual data and have been affected the most by the partial occlusions introduced by the mandates for facial masks. Such computer vision based human analysis techniques include face and face-mask detection approaches, face recognition techniques, crowd counting solutions, age and expression estimation procedures, models for detecting face-hand interactions and many others, and have seen considerable attention over recent years. The goal of this survey is to provide an introduction to the problems induced by COVID-19 into such research and to present a comprehensive review of the work done in the computer vision based human analysis field. Particular attention is paid to the impact of facial masks on the performance of various methods and recent solutions to mitigate this problem. Additionally, a detailed review of existing datasets useful for the development and evaluation of methods for COVID-19 related applications is also provided. Finally, to help advance the field further, a discussion on the main open challenges and future research direction is given.
translated by 谷歌翻译
Virtual reality (VR) over wireless is expected to be one of the killer applications in next-generation communication networks. Nevertheless, the huge data volume along with stringent requirements on latency and reliability under limited bandwidth resources makes untethered wireless VR delivery increasingly challenging. Such bottlenecks, therefore, motivate this work to seek the potential of using semantic communication, a new paradigm that promises to significantly ease the resource pressure, for efficient VR delivery. To this end, we propose a novel framework, namely WIreless SEmantic deliveRy for VR (WiserVR), for delivering consecutive 360{\deg} video frames to VR users. Specifically, deep learning-based multiple modules are well-devised for the transceiver in WiserVR to realize high-performance feature extraction and semantic recovery. Among them, we dedicatedly develop a concept of semantic location graph and leverage the joint-semantic-channel-coding method with knowledge sharing to not only substantially reduce communication latency, but also to guarantee adequate transmission reliability and resilience under various channel states. Moreover, implementation of WiserVR is presented, followed by corresponding initial simulations for performance evaluation compared with benchmarks. Finally, we discuss several open issues and offer feasible solutions to unlock the full potential of WiserVR.
translated by 谷歌翻译
使用人工智能(AI)赋予无线网络中数据量的前所未有的数据量激增,为提供无处不在的数据驱动智能服务而开辟了新的视野。通过集中收集数据集和培训模型来实现传统的云彩中心学习(ML)基础的服务。然而,这种传统的训练技术包括两个挑战:(i)由于数据通信增加而导致的高通信和能源成本,(ii)通过允许不受信任的各方利用这些信息来威胁数据隐私。最近,鉴于这些限制,一种新兴的新兴技术,包括联合学习(FL),以使ML带到无线网络的边缘。通过以分布式方式培训全局模型,可以通过FL Server策划的全局模型来提取数据孤岛的好处。 FL利用分散的数据集和参与客户的计算资源,在不影响数据隐私的情况下开发广义ML模型。在本文中,我们介绍了对FL的基本面和能够实现技术的全面调查。此外,提出了一个广泛的研究,详细说明了无线网络中的流体的各种应用,并突出了他们的挑战和局限性。进一步探索了FL的疗效,其新兴的前瞻性超出了第五代(B5G)和第六代(6G)通信系统。本调查的目的是在关键的无线技术中概述了流动的技术,这些技术将作为建立对该主题的坚定了解的基础。最后,我们向未来的研究方向提供前进的道路。
translated by 谷歌翻译
尽管人工智能为车载设备提供了各种复杂的功能,但是为连接和自主车辆(CAVS)的安全驾驶安全驾驶仍然是广泛的担忧。此外,不同的恶意网络攻击与全球车辆互联网的实施以及互联网的实施方向,这暴露了一系列可靠性和隐私威胁,用于管理CAV网络中的数据。结合现有CAV在处理密集计算任务中的能力是有限的,这意味着需要设计有效的评估系统,以保证在不影响数据安全性的情况下保证自动驾驶安全性。在这篇文章中,我们提出了一种新颖的框架,即支持区块链的智能安全驾驶评估(最好),提供了一种智能且可靠的方法,用于在保护车辆信息时进行安全驾驶监督。具体地,引入了利用长短期存储器模型的有希望的解决方案来评估移动脉冲的安全水平。然后,我们研究了分布式区块链如何通过采用基于拜占庭的容错的委托持有的持代理机制来获得足够的可信度和鲁棒性。仿真结果表明,与现有方案相比,我们所提出的最佳预测准确性更高的预测准确性。最后,我们讨论了未来CAV网络中需要解决的几个开放挑战。
translated by 谷歌翻译
We present NusaCrowd, a collaborative initiative to collect and unite existing resources for Indonesian languages, including opening access to previously non-public resources. Through this initiative, we have has brought together 137 datasets and 117 standardized data loaders. The quality of the datasets has been assessed manually and automatically, and their effectiveness has been demonstrated in multiple experiments. NusaCrowd's data collection enables the creation of the first zero-shot benchmarks for natural language understanding and generation in Indonesian and its local languages. Furthermore, NusaCrowd brings the creation of the first multilingual automatic speech recognition benchmark in Indonesian and its local languages. Our work is intended to help advance natural language processing research in under-represented languages.
translated by 谷歌翻译
Cancer is one of the most challenging diseases because of its complexity, variability, and diversity of causes. It has been one of the major research topics over the past decades, yet it is still poorly understood. To this end, multifaceted therapeutic frameworks are indispensable. \emph{Anticancer peptides} (ACPs) are the most promising treatment option, but their large-scale identification and synthesis require reliable prediction methods, which is still a problem. In this paper, we present an intuitive classification strategy that differs from the traditional \emph{black box} method and is based on the well-known statistical theory of \emph{sparse-representation classification} (SRC). Specifically, we create over-complete dictionary matrices by embedding the \emph{composition of the K-spaced amino acid pairs} (CKSAAP). Unlike the traditional SRC frameworks, we use an efficient \emph{matching pursuit} solver instead of the computationally expensive \emph{basis pursuit} solver in this strategy. Furthermore, the \emph{kernel principal component analysis} (KPCA) is employed to cope with non-linearity and dimension reduction of the feature space whereas the \emph{synthetic minority oversampling technique} (SMOTE) is used to balance the dictionary. The proposed method is evaluated on two benchmark datasets for well-known statistical parameters and is found to outperform the existing methods. The results show the highest sensitivity with the most balanced accuracy, which might be beneficial in understanding structural and chemical aspects and developing new ACPs. The Google-Colab implementation of the proposed method is available at the author's GitHub page (\href{https://github.com/ehtisham-Fazal/ACP-Kernel-SRC}{https://github.com/ehtisham-fazal/ACP-Kernel-SRC}).
translated by 谷歌翻译
The number of international benchmarking competitions is steadily increasing in various fields of machine learning (ML) research and practice. So far, however, little is known about the common practice as well as bottlenecks faced by the community in tackling the research questions posed. To shed light on the status quo of algorithm development in the specific field of biomedical imaging analysis, we designed an international survey that was issued to all participants of challenges conducted in conjunction with the IEEE ISBI 2021 and MICCAI 2021 conferences (80 competitions in total). The survey covered participants' expertise and working environments, their chosen strategies, as well as algorithm characteristics. A median of 72% challenge participants took part in the survey. According to our results, knowledge exchange was the primary incentive (70%) for participation, while the reception of prize money played only a minor role (16%). While a median of 80 working hours was spent on method development, a large portion of participants stated that they did not have enough time for method development (32%). 25% perceived the infrastructure to be a bottleneck. Overall, 94% of all solutions were deep learning-based. Of these, 84% were based on standard architectures. 43% of the respondents reported that the data samples (e.g., images) were too large to be processed at once. This was most commonly addressed by patch-based training (69%), downsampling (37%), and solving 3D analysis tasks as a series of 2D tasks. K-fold cross-validation on the training set was performed by only 37% of the participants and only 50% of the participants performed ensembling based on multiple identical models (61%) or heterogeneous models (39%). 48% of the respondents applied postprocessing steps.
translated by 谷歌翻译
Neuromorphic vision or event vision is an advanced vision technology, where in contrast to the visible camera that outputs pixels, the event vision generates neuromorphic events every time there is a brightness change which exceeds a specific threshold in the field of view (FOV). This study focuses on leveraging neuromorphic event data for roadside object detection. This is a proof of concept towards building artificial intelligence (AI) based pipelines which can be used for forward perception systems for advanced vehicular applications. The focus is on building efficient state-of-the-art object detection networks with better inference results for fast-moving forward perception using an event camera. In this article, the event-simulated A2D2 dataset is manually annotated and trained on two different YOLOv5 networks (small and large variants). To further assess its robustness, single model testing and ensemble model testing are carried out.
translated by 谷歌翻译
Recent works have shown that unstructured text (documents) from online sources can serve as useful auxiliary information for zero-shot image classification. However, these methods require access to a high-quality source like Wikipedia and are limited to a single source of information. Large Language Models (LLM) trained on web-scale text show impressive abilities to repurpose their learned knowledge for a multitude of tasks. In this work, we provide a novel perspective on using an LLM to provide text supervision for a zero-shot image classification model. The LLM is provided with a few text descriptions from different annotators as examples. The LLM is conditioned on these examples to generate multiple text descriptions for each class(referred to as views). Our proposed model, I2MVFormer, learns multi-view semantic embeddings for zero-shot image classification with these class views. We show that each text view of a class provides complementary information allowing a model to learn a highly discriminative class embedding. Moreover, we show that I2MVFormer is better at consuming the multi-view text supervision from LLM compared to baseline models. I2MVFormer establishes a new state-of-the-art on three public benchmark datasets for zero-shot image classification with unsupervised semantic embeddings.
translated by 谷歌翻译
In this paper, deep-learning-based approaches namely fine-tuning of pretrained convolutional neural networks (VGG16 and VGG19), and end-to-end training of a developed CNN model, have been used in order to classify X-Ray images into four different classes that include COVID-19, normal, opacity and pneumonia cases. A dataset containing more than 20,000 X-ray scans was retrieved from Kaggle and used in this experiment. A two-stage classification approach was implemented to be compared to the one-shot classification approach. Our hypothesis was that a two-stage model will be able to achieve better performance than a one-shot model. Our results show otherwise as VGG16 achieved 95% accuracy using one-shot approach over 5-fold of training. Future work will focus on a more robust implementation of the two-stage classification model Covid-TSC. The main improvement will be allowing data to flow from the output of stage-1 to the input of stage-2, where stage-1 and stage-2 models are VGG16 models fine-tuned on the Covid-19 dataset.
translated by 谷歌翻译